Canonical Least Squares Clustering on Sparse Medical Data

Author

  • Igor Gitman
Abstract

We explore applications of Canonical Least Squares (CLS) clustering to a corpus of sparse medical claims from a particular health insurance provider. We find several reasons why most conclusions based on CLS clusters might be misleading, especially when the data is highly sparse. We illustrate these findings with a number of synthetic experiments, focusing mainly on the sparsity issue because it has not previously been explored in the literature. Based on the insights from the synthetic experiments, we show how CLS clustering could be applied to identify hospital peer groups: hospitals that share similar operational characteristics. In addition, we demonstrate that CLS clustering can be used to improve predictions of patients' length of stay in hospital.

Similar resources

Sparse Nonnegative Matrix Factorization for Clustering

Properties of Nonnegative Matrix Factorization (NMF) as a clustering method are studied by relating its formulation to other methods such as K-means clustering. We show how interpreting the objective function of K-means as that of a lower rank approximation with special constraints allows comparisons between the constraints of NMF and K-means and provides the insight that some constraints can b...

Learning Mixtures of Multi-Output Regression Models by Correlation Clustering for Multi-View Data

In many datasets, different parts of the data may have their own patterns of correlation, a structure that can be modeled as a mixture of local linear correlation models. The task of finding these mixtures is known as correlation clustering. In this work, we propose a linear correlation clustering method for datasets whose features are pre-divided into two views. The method, called Canonical Le...

Subspace Clustering Reloaded: Sparse vs. Dense Representations

State-of-the-art methods for learning unions of subspaces from a collection of data leverage sparsity to form representations of each vector in the dataset with respect to the remaining vectors in the dataset. The resulting sparse representations can be used to form a subspace affinity matrix to cluster the data into their respective subspaces. While sparsity-driven methods for subspace cluster...

Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares

Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Non-negative matrix factorization (NMF) is a useful technique in approximating these high dimensional data. Sparse NMFs are also useful when we need to control the degree of sparseness in non-negative basis vectors ...

A Least-Squares Unified View of PCA, LDA, CCA and Spectral Graph Methods

Over the last century Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA) and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization, and clustering. This paper proposes a unified framework to formulate PCA, LDA, CCA, and SC as a ...

Publication date: 2017